Semantic Modelling of Organizational Knowledge as a Basis for Enterprise Data Governance 4.0 -- Application to a Unified Clinical Data Model

Oliveira, Miguel AP, Manara, Stephane, Molé, Bruno, Muller, Thomas, Guillouche, Aurélien, Hesske, Lysann, Jordan, Bruce, Hubert, Gilles, Kulkarni, Chinmay, Jagdev, Pralipta, Berger, Cedric R.

arXiv.org Artificial Intelligence

Individuals and organizations cope with an ever-growing amount of data, heterogeneous in both content and format. An adequate data management process, yielding data quality and control over its lifecycle, is a prerequisite to getting value out of this data and minimizing the inherent risks of its many uses. Common data governance frameworks rely on people, policies, and processes that fall short of the overwhelming complexity of data. Yet, harnessing this complexity is necessary to achieve high-quality standards, which will condition the outcome of any downstream data usage, including generative artificial intelligence trained on this data. In this paper, we report our concrete experience establishing a simple, cost-efficient framework that enables metadata-driven, agile and (semi-)automated data governance (i.e. Data Governance 4.0). We explain how we implement and use this framework to integrate 25 years of clinical study data at enterprise scale in a fully productive environment. The framework encompasses both methodologies and technologies leveraging semantic web principles. We built a knowledge graph describing avatars of data assets in their business context, including governance principles. Multiple ontologies, articulated by an enterprise upper ontology, enable key governance actions such as FAIRification, lifecycle management, definition of roles and responsibilities, lineage across transformations, and provenance from source systems. This metadata model is the keystone to Data Governance 4.0: a semi-automated data management process that considers the business context in an agile manner, adapting governance constraints to each use case and dynamically tuning them as the business changes.
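The paper's central artifact is a knowledge graph of data-asset "avatars" whose edges capture governance metadata such as ownership and provenance. As a rough conceptual illustration only (the asset, predicate, and namespace names below are invented, not the authors' actual ontology), a minimal triple store can record an asset's business context and let governance actions like provenance tracing run as simple graph walks:

```python
# Minimal illustration of a metadata knowledge graph as a list of
# (subject, predicate, object) triples. All names here are invented.
triples = [
    ("study:CT-2021-001", "rdf:type", "gov:ClinicalDataset"),
    ("study:CT-2021-001", "gov:hasOwner", "person:data_steward_A"),
    ("study:CT-2021-001", "gov:derivedFrom", "source:legacy_EDC"),
    ("table:unified_labs", "gov:derivedFrom", "study:CT-2021-001"),
]

def provenance(asset, triples):
    """Walk gov:derivedFrom edges back toward the original source system."""
    chain = []
    current = asset
    while True:
        parents = [o for s, p, o in triples
                   if s == current and p == "gov:derivedFrom"]
        if not parents:
            break
        current = parents[0]
        chain.append(current)
    return chain

print(provenance("table:unified_labs", triples))
# ['study:CT-2021-001', 'source:legacy_EDC']
```

In practice such a graph would live in an RDF store and be queried with SPARQL; the sketch only shows why a triple-shaped metadata model makes lineage and provenance queries mechanical.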


Open Data on GitHub: Unlocking the Potential of AI

Roman, Anthony Cintron, Xu, Kevin, Smith, Arfon, Vega, Jehu Torres, Robinson, Caleb, Ferres, Juan M Lavista

arXiv.org Artificial Intelligence

GitHub is the world's largest platform for collaborative software development, with over 100 million users. GitHub is also used extensively for open data collaboration, hosting more than 800 million open data files, totaling 142 terabytes of data. This study highlights the potential of open data on GitHub and demonstrates how it can accelerate AI research. We analyze the existing landscape of open data on GitHub and the patterns of how users share datasets. Our findings show that GitHub is one of the largest hosts of open data in the world and has experienced an accelerated growth of open data assets over the past four years. By examining the open data landscape on GitHub, we aim to empower users and organizations to leverage existing open datasets and improve their discoverability -- ultimately contributing to the ongoing AI revolution to help address complex societal issues. We release the three datasets that we have collected to support this analysis as open datasets at https://github.com/github/open-data-on-github.


Metadata driven development realises "smart manufacturing" of data ecosystems – blog 3 - Solita Data

#artificialintelligence

This is the third part of the blog series. The 1st blog focused on the maturity model and explained how the large monolith data warehouses were created. The 2nd blog focused on metadata driven development or "smart manufacturing" of data ecosystems. This 3rd blog will talk about reverse engineering, or how existing data assets can be discovered to accelerate the development of new data products. Companies are under increasing pressure to address data silos to reduce cost, improve agility and accelerate innovation, but they struggle to deliver value from their data assets. Many companies have hundreds of systems, containing thousands of databases, hundreds of thousands of tables, millions of columns, and millions of lines of code across many different technologies. The starting point is a "data spaghetti" that nobody knows well.
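Reverse engineering that "data spaghetti" typically starts by harvesting metadata from the systems themselves rather than from documentation. A toy sketch of the idea, using an in-memory SQLite database as a stand-in for a real enterprise system (production tooling would query each engine's own catalog, e.g. `information_schema` in most SQL databases):

```python
import sqlite3

# Toy stand-in for a production database with a couple of tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT, email TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL);
""")

# Enumerate tables, then columns per table, building a metadata inventory.
inventory = {}
for (table,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"):
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    inventory[table] = cols

print(inventory)
# {'customers': ['id', 'name', 'email'], 'orders': ['id', 'customer_id', 'total']}
```

Scanning hundreds of systems this way yields the raw inventory that discovery and cataloging tools then enrich with usage and lineage information.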


The many layers of data lineage. What can we learn from google maps to…

#artificialintelligence

Having a map showing how data evolves from its sources to its destination is the dream of any organisation. Like the gold rush, everyone is after that tool connecting columns, tables and dashboards within the warehouse. But like gold, this visualisation has always been considered a privilege in the data ecosystem. Defining the lineage has been a manual task not accessible to everyone. Usually, only the ones working daily with the data transformation processes are aware of the actual flow of data -- and typically this lineage is a mix of what's in their minds, documented information, and digging into different tools' metadata.
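Underneath the map metaphor, lineage is a directed graph from sources to dashboards, and the common questions ("what breaks if this column changes?") are graph traversals. A hedged sketch with invented asset names:

```python
from collections import deque

# Invented example lineage: each edge points from an upstream asset to the
# assets built from it (source -> staging -> warehouse -> dashboards).
downstream = {
    "crm.contacts": ["staging.contacts"],
    "staging.contacts": ["warehouse.dim_customer"],
    "warehouse.dim_customer": ["dashboard.churn", "dashboard.revenue"],
}

def impacted(asset):
    """Breadth-first walk of everything downstream of `asset`."""
    seen, queue = [], deque([asset])
    while queue:
        node = queue.popleft()
        for child in downstream.get(node, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen

print(impacted("crm.contacts"))
# ['staging.contacts', 'warehouse.dim_customer', 'dashboard.churn', 'dashboard.revenue']
```

Lineage tools differ mainly in how they populate the edges (parsing SQL, reading tool metadata, instrumenting pipelines); the traversal itself stays this simple.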


Director, Data Engineering at Visa - Bengaluru, India

#artificialintelligence

Visa is a world leader in digital payments, facilitating more than 215 billion payment transactions between consumers, merchants, financial institutions and government entities across more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable and secure payments network, enabling individuals, businesses and economies to thrive. When you join Visa, you join a culture of purpose and belonging – where your growth is a priority, your identity is embraced, and the work you do matters. We believe that economies that include everyone everywhere uplift everyone everywhere. Your work will have a direct impact on billions of people around the world – helping unlock financial access to enable the future of money movement.


What is Data Governance? Top Data Governance Tools for Data Science and Machine Learning Research in 2022

#artificialintelligence

Data governance is the process of developing internal data standards and enacting rules that govern who has access to data and how it is used in analytical applications and business operations. A good data governance program guarantees that data is reliable, consistent, and accessible, and that its use complies with applicable rules and regulations regarding data protection. In addition to master data management (MDM) projects, it frequently includes data quality improvement initiatives. Software of this type offers features that facilitate the formulation of data governance policies, the construction of business glossaries and data catalogs, data mapping and classification, workflow management, collaboration, and process documentation. Software for data governance can be used in conjunction with MDM, metadata management, and data quality solutions. Data governance aims to promote confident decisions supported by solid data resources. Building policies that define data ownership, duties, and delegates is the goal of data governance.
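The catalog and policy features listed above ultimately reduce to structured metadata about each asset: who owns it, how it is classified, and who may use it. A hypothetical illustration of such a record (field names are invented, not taken from any specific governance tool):

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """Hypothetical data-catalog record carrying governance metadata."""
    name: str
    owner: str                 # accountable data owner
    classification: str        # e.g. "public", "internal", "restricted"
    glossary_terms: list = field(default_factory=list)
    allowed_roles: list = field(default_factory=list)

    def can_access(self, role):
        # Restricted assets require an explicitly granted role.
        return self.classification == "public" or role in self.allowed_roles

entry = CatalogEntry(
    name="sales.customer_master",
    owner="alice@example.com",
    classification="restricted",
    glossary_terms=["Customer", "MDM golden record"],
    allowed_roles=["analyst", "steward"],
)
print(entry.can_access("analyst"))    # True
print(entry.can_access("marketing"))  # False
```

Real governance platforms add workflow, approval, and audit layers on top, but the ownership-plus-classification core looks much like this.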


Senior Analytics Engineer

#artificialintelligence

Pango Group, an Aura Company, helps customers monitor, manage, and protect against the risks associated with their identities and personal information in a digital world. Backed by WndrCo, Warburg Pincus and General Catalyst, Pango Group is dedicated to creating the world's most comprehensive portfolio of industry-leading cybersecurity solutions. Our vision is to become THE go-to resource for every cyber protection need individuals may face - today and in the future.


Sr Data Engineer

#artificialintelligence

As the world's leader in digital payments technology, Visa's mission is to connect the world through the most creative, reliable and secure payment network - enabling individuals, businesses, and economies to thrive. Our advanced global processing network, VisaNet, provides secure and reliable payments around the world, and is capable of handling more than 65,000 transaction messages a second. The company's dedication to innovation drives the rapid growth of connected commerce on any device, and fuels the dream of a cashless future for everyone, everywhere. As the world moves from analog to digital, Visa is applying our brand, products, people, network and scale to reshape the future of commerce. At Visa, your individuality fits right in.


A Dagster Crash Course

#artificialintelligence

Hey - I'm the head of engineering at Elementl, the company that builds Dagster. This post is my take on a crash-course introduction to Dagster. And if you want to support the Dagster Open Source project, be sure to star our GitHub repo. Dagster is a data orchestrator. Think of Dagster as a framework for building data pipelines, similar to how Django is a framework for building web apps.
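The framework idea behind an orchestrator like Dagster is that you declare steps and their upstream dependencies, and the engine figures out execution order and wires outputs to inputs. The toy scheduler below is plain Python, explicitly not Dagster's actual API, just a sketch of that orchestration concept:

```python
# Conceptual sketch of an orchestrator: register steps with their upstream
# dependencies, then execute in dependency order, passing outputs along.
# This is NOT Dagster's API -- all names here are invented.
pipeline = {}

def step(*deps):
    def register(fn):
        pipeline[fn.__name__] = (fn, deps)
        return fn
    return register

@step()
def extract():
    return [1, 2, 3]

@step("extract")
def transform(rows):
    return [r * 10 for r in rows]

@step("transform")
def load(rows):
    return sum(rows)

def run(name, cache=None):
    """Recursively run upstream steps, caching results so each runs once."""
    cache = {} if cache is None else cache
    if name not in cache:
        fn, deps = pipeline[name]
        cache[name] = fn(*(run(d, cache) for d in deps))
    return cache[name]

print(run("load"))  # 60
```

Dagster layers a lot on top of this skeleton (typed assets, scheduling, observability, retries), which is exactly what makes it a framework rather than a script runner.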


Council Post: The Five Pitfalls Of Adopting AI In Financial Services And How To Avoid Them

#artificialintelligence

Suresh is a Data and AI Engineering lead for the financial services industry at Microsoft and a senior member of the IEEE Computer Society. The financial services industry (FSI) has been increasingly adopting artificial intelligence (AI) in recent years. The results of a recent survey by the Economist Intelligence Unit show that 85% of the respondents (banking IT leaders) have a "clear strategy" for using AI in product and service development. This is also evident in recent hiring trends in banks for AI-related jobs. It's great to see AI adoption at this scale, but it also makes it crucial for FSI leaders to watch for and avoid the following leading pitfalls in AI initiatives.